Sentiment Analysis Using TextBlob and VADER
Sentiment analysis is a core NLP task.
This notebook demonstrates how to perform sentiment analysis with two popular Python libraries, TextBlob and VADER. Sentiment analysis is a text-mining technique that estimates the writer's attitude or emotion, typically as positive, negative, or neutral. The dataset consists of 2000 movie reviews, evenly split between positive and negative sentiment.
Objectives
- Perform sentiment analysis using TextBlob.
- Perform sentiment analysis using VADER.
- Compare the performance of TextBlob and VADER on the dataset.
Matthew Acs
Setup
This section prepares the Python environment: it installs the required libraries, imports the modules used for data handling and analysis, and downloads and unpacks the dataset.
Install Required Libraries
The pip commands below install TextBlob and vaderSentiment together with their dependencies. Note that the import cell in this notebook loads SentimentIntensityAnalyzer from nltk.sentiment.vader, so the standalone vaderSentiment package is installed but not imported in the cells shown here.
!pip install textblob
!pip install vaderSentiment
Requirement already satisfied: textblob in /usr/local/lib/python3.10/dist-packages (0.17.1)
Requirement already satisfied: nltk>=3.1 in /usr/local/lib/python3.10/dist-packages (from textblob) (3.8.1)
Requirement already satisfied: click in /usr/local/lib/python3.10/dist-packages (from nltk>=3.1->textblob) (8.1.7)
Requirement already satisfied: joblib in /usr/local/lib/python3.10/dist-packages (from nltk>=3.1->textblob) (1.4.0)
Requirement already satisfied: regex>=2021.8.3 in /usr/local/lib/python3.10/dist-packages (from nltk>=3.1->textblob) (2023.12.25)
Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from nltk>=3.1->textblob) (4.66.2)
Requirement already satisfied: vaderSentiment in /usr/local/lib/python3.10/dist-packages (3.3.2)
Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from vaderSentiment) (2.31.0)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests->vaderSentiment) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->vaderSentiment) (3.6)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->vaderSentiment) (2.0.7)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->vaderSentiment) (2024.2.2)
Import Libraries
This cell imports TextBlob and Word for text processing, SentimentIntensityAnalyzer from nltk.sentiment.vader for VADER sentiment analysis, and the standard data handling and visualization libraries pandas, matplotlib, and seaborn. It then downloads the NLTK resources 'vader_lexicon', 'stopwords', 'punkt', and 'wordnet' that the later steps rely on.
# Standard library
import os
import re
import zipfile
# Third-party libraries
import matplotlib.pyplot as plt
import nltk
import pandas as pd
import seaborn as sns
from nltk.corpus import stopwords
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from sklearn.metrics import confusion_matrix
from textblob import TextBlob, Word
# Download the NLTK resources used by VADER, TextBlob, and the lemmatizer
nltk.download('vader_lexicon')
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('wordnet')
True
Unzip Dataset
This final setup step downloads the sentiment dataset archive from a GitHub repository with wget and extracts it into the working directory. In the log below, wget saves the file as txt_sentoken.zip.1, which indicates that txt_sentoken.zip already existed from an earlier run; the extraction step reads the original txt_sentoken.zip.
# Download the file from GitHub
!wget https://github.com/matthewaaa123/Sentiment_Analysis/raw/main/txt_sentoken.zip
# Unzip the file
with zipfile.ZipFile('txt_sentoken.zip', 'r') as zip_ref:
    zip_ref.extractall('./')
--2024-04-17 16:49:43-- https://github.com/matthewaaa123/Sentiment_Analysis/raw/main/txt_sentoken.zip
Resolving github.com (github.com)... 140.82.116.3
Connecting to github.com (github.com)|140.82.116.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/matthewaaa123/Sentiment_Analysis/main/txt_sentoken.zip [following]
--2024-04-17 16:49:44-- https://raw.githubusercontent.com/matthewaaa123/Sentiment_Analysis/main/txt_sentoken.zip
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4845987 (4.6M) [application/zip]
Saving to: ‘txt_sentoken.zip.1’
txt_sentoken.zip.1 100%[===================>] 4.62M --.-KB/s in 0.09s
2024-04-17 16:49:44 (52.8 MB/s) - ‘txt_sentoken.zip.1’ saved [4845987/4845987]
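When a download cell is re-run, wget keeps the existing file and writes the new copy under a numbered name such as txt_sentoken.zip.1. A minimal sketch of a download step that skips the fetch when the archive is already present could look like this. The helper name fetch_if_missing is hypothetical and not part of the original notebook, and the sketch assumes urllib.request.urlretrieve from the standard library is an acceptable substitute for wget.

```python
import os
import urllib.request


def fetch_if_missing(url, dest):
    """Download url to dest unless dest already exists.

    Returns True if a download happened and False if the existing
    file was reused. Hypothetical helper, not from the notebook.
    """
    if os.path.exists(dest):
        # Reuse the local copy instead of creating a duplicate
        return False
    urllib.request.urlretrieve(url, dest)
    return True
```

It could be called as fetch_if_missing('https://github.com/matthewaaa123/Sentiment_Analysis/raw/main/txt_sentoken.zip', 'txt_sentoken.zip') before the extraction step.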
Load and Preprocess Dataset
This section loads the review files and prepares the text for sentiment analysis. It reads the text files from the positive and negative folders, builds a labeled dataframe for each class, and merges and shuffles them into a single dataset. The text is then cleaned, tokenized, and lemmatized before it is passed to the sentiment analyzers.
The first code cell defines a function that reads every file in a folder and returns a dataframe of texts with a given label. It builds one dataframe per sentiment class, concatenates the two, and shuffles the rows. Because sample() is called without a random_state, the row order differs from run to run.
# Function to read files from a folder and return a dataframe
def create_dataframe(folder_path, label):
    data = []
    file_names = os.listdir(folder_path)
    for file_name in file_names:
        file_path = os.path.join(folder_path, file_name)
        with open(file_path, 'r') as file:
            text = file.read()
        data.append({'text': text, 'label': label})
    return pd.DataFrame(data)
# Paths to positive and negative folders
pos_folder = '/content/txt_sentoken/pos'
neg_folder = '/content/txt_sentoken/neg'
# Create dataframes for positive and negative reviews
pos_df = create_dataframe(pos_folder, label='positive')
neg_df = create_dataframe(neg_folder, label='negative')
# Concatenate dataframes
labeled_data = pd.concat([pos_df, neg_df], ignore_index=True)
# Shuffle the dataframe
labeled_data = labeled_data.sample(frac=1).reset_index(drop=True)
# Display the first few rows of the dataframe
print("Dataframe")
print("--------------------")
print(labeled_data.head())
Dataframe
--------------------
text label
0 in wonder boys michael douglas plays an aged w... positive
1 there is a rule when it comes to movies . \na ... negative
2 gere , willis , poitier chase each other aroun... positive
3 the comet-disaster flick is a disaster alright... negative
4 because the press screening of " planet of the... positive
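The loader relies on os.listdir, which returns entries in arbitrary order, and on open() without an explicit encoding, which falls back to the platform default. A hedged sketch of a more deterministic variant is shown here. The function name read_labeled_texts is hypothetical, and the sketch assumes that the review files end in .txt and can be read as UTF-8; neither assumption is confirmed by the notebook.

```python
from pathlib import Path


def read_labeled_texts(folder_path, label, encoding="utf-8"):
    """Return [{'text': ..., 'label': ...}, ...] for each .txt file in folder_path.

    Hypothetical helper; the .txt suffix and UTF-8 encoding are assumptions.
    """
    records = []
    # sorted() makes the file order the same on every platform
    for path in sorted(Path(folder_path).glob("*.txt")):
        # errors="replace" avoids a crash on an unexpected byte
        text = path.read_text(encoding=encoding, errors="replace")
        records.append({"text": text, "label": label})
    return records
```

The returned list of dictionaries has the same shape as the data list in create_dataframe, so it can be passed to pd.DataFrame in the same way.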
This snippet prints the full text and label of the first row in the shuffled dataframe. It is a quick check that the files were read completely and labeled correctly.
first_row_df = labeled_data.iloc[0]
print("First text and label")
print("--------------------")
print()
print(first_row_df['text'])
print()
print(first_row_df['label'])
First text and label -------------------- in wonder boys michael douglas plays an aged writer \ professor with such lived-in naturalism that i believe it may be his best performance . ever since wall street , douglas has spent the greater part of his career playing variations on the shark in a suit gordon gecko character he personified in the mid-80's . in those performances he tended to exaggerate the vehemence of cutthroat businessmen , with much frothing at the mouth while projecting all his bad intentions to the world . you'd think such a man would keep his evil wrapped tightly underneath a good-natured veneer , but from gordon gecko to nicholas van orton , douglas played the role straight and out in the open . in wonder boys his performance isn't showy or a tour de force , it's simple yet truthful . he embodies grady , a craggy old writer with a predilection for pot and pink bathrobes . grady instructs a writers workshop while working tirelessly on a follow up to the novel that put him on the map . when we first encounter this curmudgeon in the midst of his workshop , we hear his sardonic narration on the soundtrack as students bombard one of their own with unfair criticisms . grady points out , in his narration , that they only do so out of jealousy . their target is the very writerly named james leer ( played by the always understated tobey maguire ) , a student full of potential and one whom grady develops a mild affection for . leer is the kind of youth who seems to mechanically block out emotions . he speaks in an intellectualized monotone with just a hint of dry wit around the edges . he's portentous and gloomy , as if modeling himself after the great depressed writers , though his act is a little too calculated . he reminds me of the self-imposed outcast film director , jim jarmusch ( dead man , ghost dog ) . 
whenever i happen to catch jarmusch in an interview i see the man speaking in a toneless manner ( the monotonous drawl supposedly masking depth or contempt for his interviewer ) , exclusively dressed in black , and with his spiked hair dyed snow white . leer is similar , a guy who equates quirks with depth . tobey maguire fits well in the role . with his round , sweet-eerie face he resembles bud cort from harold and maude . but unlike cort , maguire is easier to warm up to ; he's a messed up kid reaching for artistic credibility . katie holmes plays hannah , a beautiful , talented writing student just itching to get in grady's pants . this is a plot line i had trouble with . douglas , in his old age , is beginning to resemble jerry springer , a man who has actually paid for sex on numerous documented occasions . at first i found it extremely difficult to believe that someone as beautiful as hannah would desire grady ( maybe it's because i'm jealous , and wish holmes was throwing herself at me , after all i may just be a lowly internet critic but at least i still have all my teeth ) , then i think of douglas's real life companion , the breathtaking catherine zeta jones . seeing those two together looks a lot like a kidnapping . suddenly my mind has shifted from the task at hand ( that being reviewing this completely wonderful movie ) and i'm pontificating on why the hell jones would desire douglas . there is a movie in there somewhere . grady , rather chivalrously if you ask me , resists the charms of hannah for sara gaskell ( a droll frances mcdormand ) , who is his age , but also married to another professor . okay , maybe not so chivalrous . there is a great line in the film spoken by douglas about sara where he says , " she was a junkie for the printed word . lucky for me i manufactured her drug of choice " . robert downy j . r plays a bisexual editor who makes his entrance with a towering transvestite on his arm . 
downy has mastered the gleefully dry hyper articulate wit of many a hipster intellectual . he's arrogant but completely likeable in his utter arrogance . the actor is perfectly cast here , and remains a joyous movie presence somewhere between a typical tom hanksian comic leading man and edgy character actor . i wish wonder boys had more of him . searching for a plot among the elements of wonder boys would be pointless , for it meanders through its running time , but that's part of its charm . and maybe i'm a bit biased towards the film because it takes place in a haven of literary academia , a place i'm greatly fond of , and a place rarely explored in american cinema . everyone has a sub-genre ( be it war films , westerns , dance movies ) that they happen to be privy to . i'm privy to films about literary types i . e . those individuals enthralled by the written word , and if you are not so inclined it may be wise to knock my above grade down about half a notch . the direction by curtis hanson is more akin to a european film with its leisurely pace and situations that grow from the characters , rather than generic mapped out story points . sometimes the dialogue is too clever , but that's a problem i wish i found with films more often . another minor quibble is that early on the film seems a bit too introverted , like its characters , but as the story progresses it begins to open up . for me wonder boys works as subtle drama because of its insight into artistic types , and as a low-key comedy for its chuckle-worthy throwaway gags . the gags are like those in the great robert altman ( m * a * s * h , the long goodbye ) movies , where jokes exist as asides on the fringes , like jokes in life often do . the broader comedy such as the killing of a blind dog , and incessant smoking of marijuana isn't ineffective but not nearly as memorable as the little things . 
curtis hanson , who before his last film , la confidential , toiled about with exploitation fare like losin' it ( an early tom cruise sex comedy ) and the hand the rocks the cradle , has graduated to more meaningful films . he directs wonder boys in an appropriately dour style , the comedy coming from the false gloom his characters put up . the morose crooning of leonard cohen would seem an odd song for the background of any party , but in a wonder boys party , it fits . the film is like a piece of literature put up on the big screen . it's the cinematic equivalent to a good read , novelistic in its approach with themes rarely found in american movies . many will find it slight , but i found much to savor among its subtleties . positive
This snippet defines the cleaning and preprocessing functions. clean_text strips HTML tags, deletes every character that is not a letter or whitespace, and lowercases the text; tokenize and lemmatize then split the text into words and reduce each word to its lemma with TextBlob. The preprocessed text replaces the original in the dataframe. Two side effects are visible in the output: punctuation is deleted rather than replaced with a space, so hyphenated words are fused ('comet-disaster' becomes 'cometdisaster'), and Word.lemmatize() treats every token as a noun by default, so words such as 'was' and 'has' are truncated to 'wa' and 'ha'.
# Text cleaning function
def clean_text(text):
    text = re.sub(r'<.*?>', '', text)  # Remove HTML tags
    # Pass flags by keyword: the fourth positional argument of re.sub is count, not flags
    text = re.sub(r'[^a-zA-Z\s]', '', text, flags=re.I | re.A)  # Remove special characters and digits
    text = text.lower()  # Convert to lowercase
    return text
# Tokenization function
def tokenize(text):
    return TextBlob(text).words
# Lemmatization function using TextBlob's Word
def lemmatize(tokens):
    return [Word(word).lemmatize() for word in tokens]
# Function to preprocess text
def preprocess_text(text):
    text = clean_text(text)
    tokens = tokenize(text)
    lemmatized_tokens = lemmatize(tokens)
    return ' '.join(lemmatized_tokens)
labeled_data['text'] = labeled_data['text'].apply(preprocess_text)
print("Preprocessed Dataframe")
print("--------------------")
print(labeled_data.head())
Preprocessed Dataframe
--------------------
text label
0 in wonder boy michael douglas play an aged wri... positive
1 there is a rule when it come to movie a sequel... negative
2 gere willis poitier chase each other around th... positive
3 the cometdisaster flick is a disaster alright ... negative
4 because the press screening of planet of the a... positive
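One detail of the cleaning step is worth isolating: re.sub takes count as its fourth positional parameter, so flags such as re.I | re.A have to be passed by keyword. Passed positionally, the value of re.I | re.A (258) is read as "replace at most 258 matches" and no flags are applied. The short sketch below illustrates the difference on a made-up string; the sample text and variable names are illustrative only.

```python
import re

text = "A1 b2 C3 d4"

# A flag value passed positionally would land in the count slot
print(int(re.I | re.A))  # 258

# count=2 stops after two replacements, so later digits survive
limited = re.sub(r"[^a-zA-Z\s]", "", text, count=2)

# flags= applies IGNORECASE, so a lowercase-only class still keeps capitals
by_keyword = re.sub(r"[^a-z\s]", "", text, flags=re.I | re.A)

print(limited)     # A b C3 d4
print(by_keyword)  # A b C d
```

In a long review with more than 258 punctuation marks and digits, a positional flag argument would leave the tail of the text uncleaned, which is easy to miss when only the first rows are inspected.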
Sentiment Analysis with TextBlob
This section applies TextBlob, a Python library with a simple API for common natural language processing tasks such as part-of-speech tagging, noun phrase extraction, and sentiment analysis. Each text is classified as positive, neutral, or negative from the polarity score that TextBlob computes, which ranges from -1 to 1.
The code below defines two functions: analyze_sentiment returns the polarity score, and analyze_sentiment_binary maps it to a label (positive if the score is above 0, neutral if it is exactly 0, negative otherwise). Despite its name, the blob_sentiment_binary column can therefore hold three values. Both functions are applied to the 'text' column, and the first rows of the augmented dataframe are printed so that the predictions can be compared with the actual labels.
# Function to perform sentiment analysis using TextBlob
def analyze_sentiment_binary(text):
    blob = TextBlob(text)
    sentiment = blob.sentiment
    polarity = sentiment.polarity
    if polarity > 0:
        return 'positive'
    elif polarity == 0:
        return 'neutral'
    else:
        return 'negative'
def analyze_sentiment(text):
    blob = TextBlob(text)
    sentiment = blob.sentiment
    polarity = sentiment.polarity
    return polarity
# Apply sentiment analysis to the 'text' column of the dataframe
labeled_data['blob_sentiment_binary'] = labeled_data['text'].apply(analyze_sentiment_binary)
labeled_data['blob_sentiment'] = labeled_data['text'].apply(analyze_sentiment)
# Display the dataframe with sentiment analysis results
print("Dataframe with TextBlob predictions")
print("--------------------")
print(labeled_data.head())
Dataframe with TextBlob predictions
--------------------
text label \
0 in wonder boy michael douglas play an aged wri... positive
1 there is a rule when it come to movie a sequel... negative
2 gere willis poitier chase each other around th... positive
3 the cometdisaster flick is a disaster alright ... negative
4 because the press screening of planet of the a... positive
blob_sentiment_binary blob_sentiment
0 positive 0.149617
1 negative -0.047302
2 positive 0.144301
3 positive 0.191110
4 positive 0.124615
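As a small worked example of comparing predictions with labels, the five rows printed above can be scored by hand. The sketch below copies the label and blob_sentiment_binary values from that output into a plain list; five rows are far too few to say anything about performance on the full dataset, so this only illustrates the arithmetic.

```python
# (actual label, TextBlob label) for the five rows shown in the output above
rows = [
    ("positive", "positive"),
    ("negative", "negative"),
    ("positive", "positive"),
    ("negative", "positive"),  # row 3: negative review scored 0.191110
    ("positive", "positive"),
]

# Count the rows where the prediction matches the label
correct = sum(actual == predicted for actual, predicted in rows)
accuracy = correct / len(rows)

print(f"{correct}/{len(rows)} correct, accuracy = {accuracy:.2f}")  # 4/5 correct, accuracy = 0.80
```

On the full dataframe, the same quantity can be computed with (labeled_data['label'] == labeled_data['blob_sentiment_binary']).mean().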
The final snippet prints the first three rows in full: the preprocessed text, the actual label, the TextBlob label, and the polarity score. These examples show where the predicted sentiment agrees with the label and how close to zero the scores are for these reviews (between about -0.05 and 0.15).
first_row_df = labeled_data.iloc[0]
print("Text 1 with TextBlob predictions")
print("--------------------")
print("Text: ", first_row_df['text'])
print()
print("Label: ", first_row_df['label'])
print()
print("Binary Sentiment: ", first_row_df['blob_sentiment_binary'])
print()
print("Sentiment Score: ", first_row_df['blob_sentiment'])
print()
print()
second_row_df = labeled_data.iloc[1]
print("Text 2 with TextBlob predictions")
print("--------------------")
print("Text: ", second_row_df['text'])
print()
print("Label: ", second_row_df['label'])
print()
print("Binary Sentiment: ", second_row_df['blob_sentiment_binary'])
print()
print("Sentiment Score: ", second_row_df['blob_sentiment'])
print()
print()
third_row_df = labeled_data.iloc[2]
print("Text 3 with TextBlob predictions")
print("--------------------")
print("Text: ", third_row_df['text'])
print()
print("Label: ", third_row_df['label'])
print()
print("Binary Sentiment: ", third_row_df['blob_sentiment_binary'])
print()
print("Sentiment Score: ", third_row_df['blob_sentiment'])
Text 1 with TextBlob predictions -------------------- Text: in wonder boy michael douglas play an aged writer professor with such livedin naturalism that i believe it may be his best performance ever since wall street douglas ha spent the greater part of his career playing variation on the shark in a suit gordon gecko character he personified in the mids in those performance he tended to exaggerate the vehemence of cutthroat businessmen with much frothing at the mouth while projecting all his bad intention to the world youd think such a man would keep his evil wrapped tightly underneath a goodnatured veneer but from gordon gecko to nicholas van orton douglas played the role straight and out in the open in wonder boy his performance isnt showy or a tour de force it simple yet truthful he embodies grady a craggy old writer with a predilection for pot and pink bathrobe grady instructs a writer workshop while working tirelessly on a follow up to the novel that put him on the map when we first encounter this curmudgeon in the midst of his workshop we hear his sardonic narration on the soundtrack a student bombard one of their own with unfair criticism grady point out in his narration that they only do so out of jealousy their target is the very writerly named james leer played by the always understated tobey maguire a student full of potential and one whom grady develops a mild affection for leer is the kind of youth who seems to mechanically block out emotion he speaks in an intellectualized monotone with just a hint of dry wit around the edge he portentous and gloomy a if modeling himself after the great depressed writer though his act is a little too calculated he reminds me of the selfimposed outcast film director jim jarmusch dead man ghost dog whenever i happen to catch jarmusch in an interview i see the man speaking in a toneless manner the monotonous drawl supposedly masking depth or contempt for his interviewer exclusively dressed in black and with his spiked 
hair dyed snow white leer is similar a guy who equates quirk with depth tobey maguire fit well in the role with his round sweeteerie face he resembles bud cort from harold and maude but unlike cort maguire is easier to warm up to he a messed up kid reaching for artistic credibility katie holmes play hannah a beautiful talented writing student just itching to get in gradys pant this is a plot line i had trouble with douglas in his old age is beginning to resemble jerry springer a man who ha actually paid for sex on numerous documented occasion at first i found it extremely difficult to believe that someone a beautiful a hannah would desire grady maybe it because im jealous and wish holmes wa throwing herself at me after all i may just be a lowly internet critic but at least i still have all my teeth then i think of douglas real life companion the breathtaking catherine zeta jones seeing those two together look a lot like a kidnapping suddenly my mind ha shifted from the task at hand that being reviewing this completely wonderful movie and im pontificating on why the hell jones would desire douglas there is a movie in there somewhere grady rather chivalrously if you ask me resists the charm of hannah for sara gaskell a droll france mcdormand who is his age but also married to another professor okay maybe not so chivalrous there is a great line in the film spoken by douglas about sara where he say she wa a junkie for the printed word lucky for me i manufactured her drug of choice robert downy j r play a bisexual editor who make his entrance with a towering transvestite on his arm downy ha mastered the gleefully dry hyper articulate wit of many a hipster intellectual he arrogant but completely likeable in his utter arrogance the actor is perfectly cast here and remains a joyous movie presence somewhere between a typical tom hanksian comic leading man and edgy character actor i wish wonder boy had more of him searching for a plot among the element of wonder boy would be 
pointless for it meander through it running time but thats part of it charm and maybe im a bit biased towards the film because it take place in a haven of literary academia a place im greatly fond of and a place rarely explored in american cinema everyone ha a subgenre be it war film western dance movie that they happen to be privy to im privy to film about literary type i e those individual enthralled by the written word and if you are not so inclined it may be wise to knock my above grade down about half a notch the direction by curtis hanson is more akin to a european film with it leisurely pace and situation that grow from the character rather than generic mapped out story point sometimes the dialogue is too clever but thats a problem i wish i found with film more often another minor quibble is that early on the film seems a bit too introverted like it character but a the story progress it begin to open up for me wonder boy work a subtle drama because of it insight into artistic type and a a lowkey comedy for it chuckleworthy throwaway gag the gag are like those in the great robert altman m a s h the long goodbye movie where joke exist a aside on the fringe like joke in life often do the broader comedy such a the killing of a blind dog and incessant smoking of marijuana isnt ineffective but not nearly a memorable a the little thing curtis hanson who before his last film la confidential toiled about with exploitation fare like losin it an early tom cruise sex comedy and the hand the rock the cradle ha graduated to more meaningful film he directs wonder boy in an appropriately dour style the comedy coming from the false gloom his character put up the morose crooning of leonard cohen would seem an odd song for the background of any party but in a wonder boy party it fit the film is like a piece of literature put up on the big screen it the cinematic equivalent to a good read novelistic in it approach with theme rarely found in american movie many will find it 
slight but i found much to savor among it subtlety Label: positive Binary Sentiment: positive Sentiment Score: 0.1496174746174746 Text 2 with TextBlob predictions -------------------- Text: there is a rule when it come to movie a sequel is never a good a the original there are very few exception to this rule and texas chainsaw massacre the next generation is not one of them now if you also take into consideration that the original chainsaw massacre wa a really bad movie and that this isnt even the first sequel to it you have a recipe for a very painful viewing experience dont be fooled by the presence of up and coming talent matthew mcconaughey a time to kill and renee zellweger jerry maguire they made this movie before they were star judging by their performance they also made it before they took any acting lesson it a wonder they ever worked in hollywood again after appearing in this turkey apparently the producer of this film realized just how bad it wa because it sat unreleased for year until someone decided that they might be able to capitalize off the success of mcconaughey and zellweger apparently the two young star were none too happy about this thing ever seeing the light of day and i dont blame them they would have been better off if this had been some sort of porno flick starring the two of them unfortunately for them it is a horror film in which zellweger play your typically stupid horror film character while mcconaughey play a guy who wear a mechanical brace on his leg that he control with a television remote control hey dont say i didnt warn you to make matter worse leatherface the chainsaw wielding maniac who wa never the scariest of psychopathic killer at the best of time ha now become a full blown crossdresser and spends the entire movie in drag there is a plot to this movie but it isnt worth mentioning let just suffice to say that a group of teenager are in the typical wrong place at the wrong time and are left to the mercy of remote control man 
mcconaughey and his lipstick wearing chainsaw revving halfwitted sidekick man i cant get over just how bad this movie is this film ha absolutely no redeeming quality even the obligatory topless babe shot wasnt enough to hold my interest for more than second the writing is bad the direction is even worse but both of those thing look good in comparison to the acting this is the sort of movie that they should make people in prison watch a guarantee you if criminal thought that they would be subjected to this film they would never break the law again Label: negative Binary Sentiment: negative Sentiment Score: -0.04730158730158728 Text 3 with TextBlob predictions -------------------- Text: gere willis poitier chase each other around the world the jackal a film review by michael redman copyright by michael redman when the soviet union imploded the western country lost their shadow with the united state friendly with the russian we no longer had an entity to blame for the world problem this showed up in hollywood film a the communist government wa no longer the easy bad guy it time to rejoice because weve found our new villain now it no longer the russian government who sends killer out into foreign land it the russian mafia a perfect solution it combine the dread of organized crime and the stillpresent uneasiness with the former eastern block country best of all the villain are still foreigner fear of the other always play best so it is a crime lord in moscow that sends legendary hitman the jackal bruce willis to assassinate a highly placed u government figure in retaliation for the death of his brother during a nightclub raid the fbi is at a loss a to how to protect the target from someone theyre not sure even exists coming to their rescue is former ira operative declan mulqueen richard gere who is temporarily released from prison to assist fbi agent carter preston sidney poitier and russian major valentina koslova diane venora mulqueens exgirlfriend basque terrorist 
isabella mathilda may is the only person who ha seen the elusive jackal presumably there is an exclusive international terrorist club somewhere where the three met the film follows two parallel track a the jackal prepares for his million hit and mulqueen attempt to locate him while preston make sure that the irishman doesnt slip away crossing numerous border and donning various disguise for both himself and his minivan the killer is always one step ahead of his pursuer being very loosely based on the same book the thriller the day of the jackal comparison between the two film is inevitable there is no doubt that the original is the better movie playing the story for suspense rather than the current actionadventure a a mystery the jackal ha enough hole in it to ruin the tale but if you can accept it for what it is there entertainment to be had hole let see a pivotal clue for mulqueen is so obscure that he must posse psychic power to pick it up for a year veteran that can command the big buck the jackal is an incredibly poor shot the final scene between gere and willis occurs in a location that should be mobbed with police but it just the two of them willis disguise usually look like bruce willis and are just a interesting a val kilmers in the saint and lest you misunderstand thats not a compliment but the three star are fun to watch it good to see gere in something other than a business suit willis ha a mixed history in picking project but his character are always watchable poitier is by far the superior actor but ha limited screen time the problem in logic are flaw but dont ruin the experience occasionally there are movie that transcend their blemish this is one of them the appeared in the bloomington voice bloomington indiana Label: positive Binary Sentiment: positive Sentiment Score: 0.14430139588302854
Sentiment Analysis with VADER
This section introduces sentiment analysis using VADER (Valence Aware Dictionary and sEntiment Reasoner), a lexicon- and rule-based tool designed for sentiment analysis of social media text. Its lexicon and rules account for slang, emoticons, and other informal expressions common on social media. The section implements functions that use VADER to assign each text a categorical sentiment label (positive, neutral, or negative) and a numerical polarity score.
The first snippet defines two functions built on VADER's sentiment intensity analyzer. Both compute the compound polarity score, a single value between -1 and 1: analyze_sentiment returns the score itself, and analyze_sentiment_binary maps it to a label (positive above 0, neutral at exactly 0, negative below 0). The functions are applied to the text column of the dataframe, adding the columns vader_sentiment_binary and vader_sentiment.
The snippet then prints the first rows of the dataframe, so the VADER predictions can be checked against the true labels and the earlier TextBlob predictions.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# Create the analyzer once and reuse it for every text
analyzer = SentimentIntensityAnalyzer()

# Functions to perform sentiment analysis using VADER
def analyze_sentiment_binary(text):
    # Map the compound score to a label; only an exact 0 counts as neutral
    polarity = analyzer.polarity_scores(text)['compound']
    if polarity > 0:
        return 'positive'
    elif polarity == 0:
        return 'neutral'
    else:
        return 'negative'

def analyze_sentiment(text):
    # Return the compound polarity score, a value between -1 and 1
    return analyzer.polarity_scores(text)['compound']

# Apply sentiment analysis to the 'text' column of the dataframe
labeled_data['vader_sentiment_binary'] = labeled_data['text'].apply(analyze_sentiment_binary)
labeled_data['vader_sentiment'] = labeled_data['text'].apply(analyze_sentiment)

# Display the dataframe with sentiment analysis results
print("Dataframe with VADER predictions")
print("--------------------")
print(labeled_data.head())
Dataframe with VADER predictions
--------------------
                                                text     label  \
0  in wonder boy michael douglas play an aged wri...  positive
1  there is a rule when it come to movie a sequel...  negative
2  gere willis poitier chase each other around th...  positive
3  the cometdisaster flick is a disaster alright ...  negative
4  because the press screening of planet of the a...  positive

  blob_sentiment_binary  blob_sentiment vader_sentiment_binary  \
0              positive        0.149617               positive
1              negative       -0.047302               negative
2              positive        0.144301               negative
3              positive        0.191110               positive
4              positive        0.124615               positive

   vader_sentiment
0           0.9991
1          -0.9921
2          -0.8652
3           0.1147
4           0.9694
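The labeling function in this section treats only an exact compound score of 0 as neutral. The VADER project's README suggests a wider neutral band instead, with scores of at least 0.05 counted as positive and scores of at most -0.05 as negative. The sketch below shows both rules side by side; the function name `label_from_compound` and its `threshold` parameter are hypothetical and not part of the notebook or of VADER.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a sentiment label.

    threshold=0.0 reproduces the notebook's rule (only an exact 0 is neutral).
    threshold=0.05 is an assumption based on the cutoff suggested for VADER.
    """
    if compound > 0 and compound >= threshold:
        return 'positive'
    if compound < 0 and compound <= -threshold:
        return 'negative'
    return 'neutral'

# Compound scores taken from the dataframe output above
for score in (0.9991, -0.9921, -0.8652, 0.1147, 0.9694):
    print(score, label_from_compound(score))

# A weak score changes label depending on the rule
print(label_from_compound(0.03, threshold=0.0))   # notebook's rule
print(label_from_compound(0.03, threshold=0.05))  # wider neutral band
```

With only positive and negative true labels in this dataset, a wider neutral band would turn more predictions into automatic errors, which may be why the notebook keeps the strict rule.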
The final snippet in this section prints the first three examples in full. Each shows the text, the true label, VADER's predicted label, and the compound score. The third example illustrates a limitation: the review is labeled positive, but VADER scores it -0.8652 and classifies it as negative.
# Print the first three rows with their VADER predictions
for i in range(3):
    row = labeled_data.iloc[i]
    print(f"Text {i + 1} with VADER predictions")
    print("--------------------")
    print("Text: ", row['text'])
    print()
    print("Label: ", row['label'])
    print()
    print("Binary Sentiment: ", row['vader_sentiment_binary'])
    print()
    print("Sentiment Score: ", row['vader_sentiment'])
    # Separate consecutive examples with two blank lines
    if i < 2:
        print()
        print()
Text 1 with VADER predictions -------------------- Text: in wonder boy michael douglas play an aged writer professor with such livedin naturalism that i believe it may be his best performance ever since wall street douglas ha spent the greater part of his career playing variation on the shark in a suit gordon gecko character he personified in the mids in those performance he tended to exaggerate the vehemence of cutthroat businessmen with much frothing at the mouth while projecting all his bad intention to the world youd think such a man would keep his evil wrapped tightly underneath a goodnatured veneer but from gordon gecko to nicholas van orton douglas played the role straight and out in the open in wonder boy his performance isnt showy or a tour de force it simple yet truthful he embodies grady a craggy old writer with a predilection for pot and pink bathrobe grady instructs a writer workshop while working tirelessly on a follow up to the novel that put him on the map when we first encounter this curmudgeon in the midst of his workshop we hear his sardonic narration on the soundtrack a student bombard one of their own with unfair criticism grady point out in his narration that they only do so out of jealousy their target is the very writerly named james leer played by the always understated tobey maguire a student full of potential and one whom grady develops a mild affection for leer is the kind of youth who seems to mechanically block out emotion he speaks in an intellectualized monotone with just a hint of dry wit around the edge he portentous and gloomy a if modeling himself after the great depressed writer though his act is a little too calculated he reminds me of the selfimposed outcast film director jim jarmusch dead man ghost dog whenever i happen to catch jarmusch in an interview i see the man speaking in a toneless manner the monotonous drawl supposedly masking depth or contempt for his interviewer exclusively dressed in black and with his spiked hair 
dyed snow white leer is similar a guy who equates quirk with depth tobey maguire fit well in the role with his round sweeteerie face he resembles bud cort from harold and maude but unlike cort maguire is easier to warm up to he a messed up kid reaching for artistic credibility katie holmes play hannah a beautiful talented writing student just itching to get in gradys pant this is a plot line i had trouble with douglas in his old age is beginning to resemble jerry springer a man who ha actually paid for sex on numerous documented occasion at first i found it extremely difficult to believe that someone a beautiful a hannah would desire grady maybe it because im jealous and wish holmes wa throwing herself at me after all i may just be a lowly internet critic but at least i still have all my teeth then i think of douglas real life companion the breathtaking catherine zeta jones seeing those two together look a lot like a kidnapping suddenly my mind ha shifted from the task at hand that being reviewing this completely wonderful movie and im pontificating on why the hell jones would desire douglas there is a movie in there somewhere grady rather chivalrously if you ask me resists the charm of hannah for sara gaskell a droll france mcdormand who is his age but also married to another professor okay maybe not so chivalrous there is a great line in the film spoken by douglas about sara where he say she wa a junkie for the printed word lucky for me i manufactured her drug of choice robert downy j r play a bisexual editor who make his entrance with a towering transvestite on his arm downy ha mastered the gleefully dry hyper articulate wit of many a hipster intellectual he arrogant but completely likeable in his utter arrogance the actor is perfectly cast here and remains a joyous movie presence somewhere between a typical tom hanksian comic leading man and edgy character actor i wish wonder boy had more of him searching for a plot among the element of wonder boy would be 
pointless for it meander through it running time but thats part of it charm and maybe im a bit biased towards the film because it take place in a haven of literary academia a place im greatly fond of and a place rarely explored in american cinema everyone ha a subgenre be it war film western dance movie that they happen to be privy to im privy to film about literary type i e those individual enthralled by the written word and if you are not so inclined it may be wise to knock my above grade down about half a notch the direction by curtis hanson is more akin to a european film with it leisurely pace and situation that grow from the character rather than generic mapped out story point sometimes the dialogue is too clever but thats a problem i wish i found with film more often another minor quibble is that early on the film seems a bit too introverted like it character but a the story progress it begin to open up for me wonder boy work a subtle drama because of it insight into artistic type and a a lowkey comedy for it chuckleworthy throwaway gag the gag are like those in the great robert altman m a s h the long goodbye movie where joke exist a aside on the fringe like joke in life often do the broader comedy such a the killing of a blind dog and incessant smoking of marijuana isnt ineffective but not nearly a memorable a the little thing curtis hanson who before his last film la confidential toiled about with exploitation fare like losin it an early tom cruise sex comedy and the hand the rock the cradle ha graduated to more meaningful film he directs wonder boy in an appropriately dour style the comedy coming from the false gloom his character put up the morose crooning of leonard cohen would seem an odd song for the background of any party but in a wonder boy party it fit the film is like a piece of literature put up on the big screen it the cinematic equivalent to a good read novelistic in it approach with theme rarely found in american movie many will find it 
slight but i found much to savor among it subtlety Label: positive Binary Sentiment: positive Sentiment Score: 0.9991 Text 2 with VADER predictions -------------------- Text: there is a rule when it come to movie a sequel is never a good a the original there are very few exception to this rule and texas chainsaw massacre the next generation is not one of them now if you also take into consideration that the original chainsaw massacre wa a really bad movie and that this isnt even the first sequel to it you have a recipe for a very painful viewing experience dont be fooled by the presence of up and coming talent matthew mcconaughey a time to kill and renee zellweger jerry maguire they made this movie before they were star judging by their performance they also made it before they took any acting lesson it a wonder they ever worked in hollywood again after appearing in this turkey apparently the producer of this film realized just how bad it wa because it sat unreleased for year until someone decided that they might be able to capitalize off the success of mcconaughey and zellweger apparently the two young star were none too happy about this thing ever seeing the light of day and i dont blame them they would have been better off if this had been some sort of porno flick starring the two of them unfortunately for them it is a horror film in which zellweger play your typically stupid horror film character while mcconaughey play a guy who wear a mechanical brace on his leg that he control with a television remote control hey dont say i didnt warn you to make matter worse leatherface the chainsaw wielding maniac who wa never the scariest of psychopathic killer at the best of time ha now become a full blown crossdresser and spends the entire movie in drag there is a plot to this movie but it isnt worth mentioning let just suffice to say that a group of teenager are in the typical wrong place at the wrong time and are left to the mercy of remote control man mcconaughey and 
his lipstick wearing chainsaw revving halfwitted sidekick man i cant get over just how bad this movie is this film ha absolutely no redeeming quality even the obligatory topless babe shot wasnt enough to hold my interest for more than second the writing is bad the direction is even worse but both of those thing look good in comparison to the acting this is the sort of movie that they should make people in prison watch a guarantee you if criminal thought that they would be subjected to this film they would never break the law again Label: negative Binary Sentiment: negative Sentiment Score: -0.9921 Text 3 with VADER predictions -------------------- Text: gere willis poitier chase each other around the world the jackal a film review by michael redman copyright by michael redman when the soviet union imploded the western country lost their shadow with the united state friendly with the russian we no longer had an entity to blame for the world problem this showed up in hollywood film a the communist government wa no longer the easy bad guy it time to rejoice because weve found our new villain now it no longer the russian government who sends killer out into foreign land it the russian mafia a perfect solution it combine the dread of organized crime and the stillpresent uneasiness with the former eastern block country best of all the villain are still foreigner fear of the other always play best so it is a crime lord in moscow that sends legendary hitman the jackal bruce willis to assassinate a highly placed u government figure in retaliation for the death of his brother during a nightclub raid the fbi is at a loss a to how to protect the target from someone theyre not sure even exists coming to their rescue is former ira operative declan mulqueen richard gere who is temporarily released from prison to assist fbi agent carter preston sidney poitier and russian major valentina koslova diane venora mulqueens exgirlfriend basque terrorist isabella mathilda may is the only 
person who ha seen the elusive jackal presumably there is an exclusive international terrorist club somewhere where the three met the film follows two parallel track a the jackal prepares for his million hit and mulqueen attempt to locate him while preston make sure that the irishman doesnt slip away crossing numerous border and donning various disguise for both himself and his minivan the killer is always one step ahead of his pursuer being very loosely based on the same book the thriller the day of the jackal comparison between the two film is inevitable there is no doubt that the original is the better movie playing the story for suspense rather than the current actionadventure a a mystery the jackal ha enough hole in it to ruin the tale but if you can accept it for what it is there entertainment to be had hole let see a pivotal clue for mulqueen is so obscure that he must posse psychic power to pick it up for a year veteran that can command the big buck the jackal is an incredibly poor shot the final scene between gere and willis occurs in a location that should be mobbed with police but it just the two of them willis disguise usually look like bruce willis and are just a interesting a val kilmers in the saint and lest you misunderstand thats not a compliment but the three star are fun to watch it good to see gere in something other than a business suit willis ha a mixed history in picking project but his character are always watchable poitier is by far the superior actor but ha limited screen time the problem in logic are flaw but dont ruin the experience occasionally there are movie that transcend their blemish this is one of them the appeared in the bloomington voice bloomington indiana Label: positive Binary Sentiment: negative Sentiment Score: -0.8652
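The compound scores printed above all sit close to -1 or 1, which is typical for long texts. To my understanding, VADER's reference implementation obtains the compound score by summing the valence of the sentiment-bearing words and squashing that sum with x / sqrt(x² + alpha), using alpha = 15. The sketch below reproduces only this normalization step; the function name `normalize_valence_sum` is hypothetical, and the raw sums are made-up inputs chosen for illustration.

```python
import math

def normalize_valence_sum(valence_sum, alpha=15):
    """Squash an unbounded valence sum into [-1, 1] (alpha=15 is assumed)."""
    score = valence_sum / math.sqrt(valence_sum * valence_sum + alpha)
    # Clamp to guard against floating point overshoot
    return max(-1.0, min(1.0, score))

# Illustrative raw sums: small for a short sentence, large for a long review
for raw in (0.5, 2.0, 10.0, 50.0, -50.0):
    print(f"{raw:>6} -> {normalize_valence_sum(raw):+.4f}")
```

Under this assumption, a long review that contains many sentiment words produces a large sum and therefore a score near the extremes, so the magnitude of the compound score reflects text length as well as intensity.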
Comparison of TextBlob and VADER
This section evaluates and compares the performance of two popular sentiment analysis tools, TextBlob and VADER, using a dataset containing labeled text data. It involves calculating accuracy, plotting sentiment distributions, analyzing error rates, and examining confusion matrices for both methods. These analyses help to identify the strengths and weaknesses of each tool in the context of sentiment analysis.
This snippet calculates the accuracy of the sentiment predictions made by TextBlob and VADER, that is, the fraction of texts whose predicted label matches the true label. Because the true labels are only positive or negative, every neutral prediction counts as an error. TextBlob achieved an accuracy of 59.6%, and VADER a slightly higher 62.4%. For reference, on a dataset evenly split between the two classes, always predicting the same class would score 50%.
# Function to compute accuracy
def compute_accuracy(predictions, true_labels):
    # Fraction of predictions that exactly match the true labels
    correct_predictions = (predictions == true_labels).sum()
    total_predictions = len(predictions)
    accuracy = correct_predictions / total_predictions
    return accuracy

# Compute accuracy for TextBlob
accuracy = compute_accuracy(labeled_data['blob_sentiment_binary'], labeled_data['label'])
print("TextBlob Accuracy:", accuracy)
print()

# Compute accuracy for VADER
accuracy = compute_accuracy(labeled_data['vader_sentiment_binary'], labeled_data['label'])
print("VADER Accuracy:", accuracy)
TextBlob Accuracy: 0.596

VADER Accuracy: 0.624
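To judge how much weight the gap between the two accuracies can bear, a rough confidence interval helps. The sketch below uses the normal-approximation (Wald) interval for a proportion with the 2000 texts mentioned in the introduction. The function name `accuracy_interval` is hypothetical, and treating the two accuracies as independent is a simplifying assumption, since both tools were evaluated on the same texts.

```python
import math

def accuracy_interval(accuracy, n, z=1.96):
    """Normal-approximation confidence interval for an accuracy (z=1.96 for 95%)."""
    half_width = z * math.sqrt(accuracy * (1 - accuracy) / n)
    return accuracy - half_width, accuracy + half_width

n_texts = 2000  # dataset size stated in the introduction
for name, acc in (("TextBlob", 0.596), ("VADER", 0.624)):
    low, high = accuracy_interval(acc, n_texts)
    print(f"{name}: {acc:.3f} (95% CI {low:.3f} to {high:.3f})")
```

Each interval is about two percentage points wide on either side, and the two intervals overlap slightly, so the difference should be read as modest. A paired test on the per-text predictions, such as McNemar's test, would be the more appropriate check.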
This code visualizes the distribution of predicted sentiments and the error types for TextBlob and VADER. For each method, a bar chart shows how many predictions fall into each sentiment category, with a lighter overlay marking the incorrect ones: false positives on the positive bar and false negatives on the negative bar. The overall error rate appears in the plot title, which makes the two methods easy to compare.
# Setting the style of the plots
sns.set(style="whitegrid")

def calculate_false_positives(predictions, true_labels):
    """Calculate the number of false positive errors."""
    return ((predictions == 'positive') & (true_labels == 'negative')).sum()

def calculate_false_negatives(predictions, true_labels):
    """Calculate the number of false negative errors."""
    return ((predictions == 'negative') & (true_labels == 'positive')).sum()

def calculate_error_rate(predictions, true_labels):
    """Calculate the percentage of incorrect predictions."""
    incorrect_predictions = (predictions != true_labels).sum()
    total_predictions = len(predictions)
    return (incorrect_predictions / total_predictions) * 100

def plot_sentiment_distribution(dataframe, method):
    """Plot sentiment distribution and error types for a given sentiment analysis method."""
    sentiment_counts = dataframe[method].value_counts()
    false_positives = calculate_false_positives(dataframe[method], dataframe['label'])
    false_negatives = calculate_false_negatives(dataframe[method], dataframe['label'])
    error_rate = calculate_error_rate(dataframe[method], dataframe['label'])
    plt.figure(figsize=(10, 8))
    # Plotting sentiment counts
    bars = plt.bar(sentiment_counts.index, sentiment_counts.values, color='tab:blue', label='Prediction Count')
    # Overlaying error counts on the matching bars
    if 'positive' in sentiment_counts:
        plt.bar('positive', false_positives, color='white', alpha=0.5, label='Incorrect')
    if 'negative' in sentiment_counts:
        plt.bar('negative', false_negatives, color='white', alpha=0.5)
    plt.title(f'Sentiment Distribution using {method} (Error Rate: {error_rate:.2f}%)')
    plt.xlabel('Sentiment Type')
    plt.ylabel('Count')
    plt.legend()
    plt.xticks(rotation=45)
    # Add counts above bars
    for bar in bars:
        yval = bar.get_height()
        plt.text(bar.get_x() + bar.get_width()/2, yval, int(yval), ha='center', va='bottom', fontsize=10, color='black')
    plt.tight_layout()
    plt.show()

plot_sentiment_distribution(labeled_data, 'vader_sentiment_binary')
plot_sentiment_distribution(labeled_data, 'blob_sentiment_binary')
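The error rate in the plot title counts every mismatch, including neutral predictions, while the overlays show only false positives and false negatives. The two can therefore differ. A minimal sketch of a full breakdown follows, using plain Python lists in place of the dataframe columns; the function name `outcome_breakdown` and the toy data are hypothetical and are not the notebook's results.

```python
from collections import Counter

def outcome_breakdown(predictions, true_labels):
    """Count (true label, predicted label) pairs, neutral predictions included."""
    return Counter(zip(true_labels, predictions))

# Toy data for illustration only
true_labels = ['positive', 'positive', 'negative', 'negative', 'positive']
predictions = ['positive', 'neutral', 'positive', 'negative', 'negative']

counts = outcome_breakdown(predictions, true_labels)
for (truth, predicted), n in sorted(counts.items()):
    print(f"true={truth:<8} predicted={predicted:<8} count={n}")

# Errors that the false positive / false negative overlays do not show
neutral_errors = sum(n for (_, predicted), n in counts.items() if predicted == 'neutral')
print("Errors caused by neutral predictions:", neutral_errors)
```

The same function should also accept pandas Series, for example `outcome_breakdown(labeled_data['vader_sentiment_binary'], labeled_data['label'])`, since it only iterates over the values.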
This snippet generates confusion matrices for both TextBlob and VADER. A confusion matrix shows the types of errors a classifier makes, for example how many positive texts were labeled negative and vice versa. Rows are true labels and columns are predicted labels. Because the matrices are restricted to the positive and negative labels, texts predicted as neutral do not appear in them. Precision and recall for each method can be derived from these counts.
# Compute confusion matrix for VADER
vader_conf_matrix = confusion_matrix(labeled_data['label'], labeled_data['vader_sentiment_binary'], labels=['positive', 'negative'])
# Compute confusion matrix for TextBlob
textblob_conf_matrix = confusion_matrix(labeled_data['label'], labeled_data['blob_sentiment_binary'], labels=['positive', 'negative'])
# Plot confusion matrix for VADER
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
sns.heatmap(vader_conf_matrix, annot=True, fmt='d', cmap='Blues', xticklabels=['positive', 'negative'], yticklabels=['positive', 'negative'])
plt.title('Confusion Matrix for VADER')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
# Plot confusion matrix for TextBlob
plt.subplot(1, 2, 2)
sns.heatmap(textblob_conf_matrix, annot=True, fmt='d', cmap='Blues', xticklabels=['positive', 'negative'], yticklabels=['positive', 'negative'])
plt.title('Confusion Matrix for TextBlob')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.tight_layout()
plt.show()
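Precision and recall follow directly from the four cells of a confusion matrix. The sketch below assumes the same layout as the matrices above (rows are true labels, columns are predicted labels, both ordered positive then negative). The function name `precision_recall` is hypothetical, and the counts are made up for illustration rather than taken from the notebook's results.

```python
def precision_recall(conf_matrix):
    """Precision and recall for the 'positive' class.

    conf_matrix layout: rows = true label, columns = predicted label,
    both in the order ['positive', 'negative'].
    """
    (tp, fn), (fp, tn) = conf_matrix
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Made-up counts for illustration only
example_matrix = [[80, 20],
                  [40, 60]]
precision, recall = precision_recall(example_matrix)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

In this toy example the classifier finds most positive texts (high recall) but also labels many negative texts as positive (lower precision), the pattern that a bias toward positive predictions would produce.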
The final snippet plots the raw polarity scores generated by TextBlob and VADER against the data point index. Points are colored green when the predicted label matches the true label and red otherwise, and a dashed line marks a score of 0. The plot shows the spread and bias of each tool's scores and where in the score range the errors concentrate.
# Function to plot raw polarity scores vs. data point number
def plot_raw_scores(dataframe, method):
    # Green = predicted label matches the true label, red = mismatch
    colors = ['tab:green' if pred == true_label else 'tab:red'
              for pred, true_label in zip(dataframe[method + "_binary"], dataframe['label'])]
    plt.figure(figsize=(10, 5))
    plt.scatter(range(len(dataframe)), dataframe[method], c=colors)
    plt.xlabel('Data Point Number')
    plt.ylabel('Raw Polarity Score')
    plt.title(f'Raw Polarity Scores vs. Data Point Number ({method})')
    plt.axhline(0, color='black', linestyle='--', linewidth=0.5)  # Add horizontal line at y=0
    plt.show()

# Plot raw polarity scores for TextBlob
plot_raw_scores(labeled_data, 'blob_sentiment')

# Plot raw polarity scores for VADER
plot_raw_scores(labeled_data, 'vader_sentiment')
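One way to put a number on what the scatter plots show is to compare the average score magnitude of correct and incorrect predictions. The sketch below does this with plain Python lists, using the five VADER rows printed earlier in this notebook as input. The function name `mean_abs_score_by_outcome` is hypothetical, and five rows are far too few to draw conclusions from.

```python
from statistics import mean

def mean_abs_score_by_outcome(scores, predictions, true_labels):
    """Average |score| for correct and for incorrect predictions."""
    rows = list(zip(scores, predictions, true_labels))
    correct = [abs(s) for s, p, t in rows if p == t]
    incorrect = [abs(s) for s, p, t in rows if p != t]
    return (mean(correct) if correct else None,
            mean(incorrect) if incorrect else None)

# First five rows of the VADER dataframe output shown earlier
scores = [0.9991, -0.9921, -0.8652, 0.1147, 0.9694]
predictions = ['positive', 'negative', 'negative', 'positive', 'positive']
true_labels = ['positive', 'negative', 'positive', 'negative', 'positive']

correct_mean, incorrect_mean = mean_abs_score_by_outcome(scores, predictions, true_labels)
print(f"mean |score| when correct:   {correct_mean:.4f}")
print(f"mean |score| when incorrect: {incorrect_mean:.4f}")
```

Running the same function on the full columns, for example `labeled_data['vader_sentiment']` with `labeled_data['vader_sentiment_binary']` and `labeled_data['label']`, would show whether scores near zero are in fact misclassified more often.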
Together, these analyses provide a comprehensive comparison of TextBlob and VADER, highlighting their performance on the same dataset and offering insights into which tool might be more suitable for specific types of text or applications.
Conclusion
The comparison of TextBlob and VADER revealed differences in their performance on labeled sentiment data. Both tools performed better than chance on this dataset, but they show different strengths and weaknesses depending on the text characteristics and the context of the analysis.
VADER outperformed TextBlob in overall accuracy, scoring 62.4% compared to TextBlob's 59.6%. This gap of 2.8 percentage points suggests that VADER may be slightly better at handling the nuances of this dataset, possibly because its lexicon and rules are tuned for informal language.
The plots of sentiment distribution highlighted the difference in error rates between the two methods and indicated that VADER's classifications were more balanced across the dataset.
The confusion matrices supported these findings: VADER showed a better balance in correctly identifying both positive and negative sentiments, which suggests that it separates the two polarities more reliably on this data.
The visualization of raw polarity scores showed how each tool assigns sentiment values to text, with VADER's scores spread over a wider range. This may reflect VADER's sensitivity to the intensity of the sentiment expressed, since its rules account for modifiers and intensifiers. The plot also showed that data points with polarity scores close to zero were more likely to be misclassified.
These results suggest that while TextBlob is a straightforward and useful tool for general sentiment analysis, VADER's specialized approach makes it more effective for texts that contain varied, informal expressions of sentiment, such as social media posts or the movie reviews in this dataset. The choice between TextBlob and VADER should therefore be guided by the requirements of the task, particularly the type of text being analyzed. For more formal texts, TextBlob could be sufficient, but for dynamic and informal texts, VADER might offer more accurate results.
Overall, the evaluation highlights the importance of choosing a sentiment analysis tool based on the characteristics of the text and the context in which the tool will be used. Both TextBlob and VADER provide useful information about text sentiment, but their effectiveness can vary considerably with the application and the data.
References¶
This section contains the references that I used to create this notebook.
- TextBlob Demo: FAU Media, YouTube
- TextBlob Tutorial: DataScienceLearner
- VADER: YouTube
- Comparing TextBlob and VADER: YouTube 1, YouTube 2
- Title Image: Sentiment Analysis